
Accelerating Large Scale Centroid-based Clustering with Locality Sensitive Hashing



Abstract

Most traditional data mining algorithms struggle to process very large datasets efficiently. In this paper, we propose a general framework that accelerates existing clustering algorithms on large-scale datasets containing large numbers of attributes, items, and clusters. Our framework uses locality sensitive hashing (LSH) to significantly reduce the cluster search space. We also prove theoretically that our framework has a guaranteed error bound in terms of clustering quality. The framework can be applied to any centroid-based clustering algorithm that assigns an object to its most similar cluster, and we adopt the popular K-Modes categorical clustering algorithm to show how it is applied. We validated our framework on five synthetic datasets and a real-world Yahoo! Answers dataset. The experimental results demonstrate that our framework speeds up the existing clustering algorithm by a factor of between 2 and 6, while maintaining comparable cluster purity.
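The idea of shrinking the cluster search space can be illustrated with a minimal sketch. The abstract does not say which LSH family or parameters the authors use, so the code below assumes MinHash with banding over (attribute, value) pairs, which is one common choice for categorical data; all function names (`make_minhash`, `build_index`, `assign`) and the `bands`/`rows` parameters are hypothetical. Each item is compared only with cluster modes that share at least one hash bucket with it, and falls back to an exhaustive search when no mode collides.

```python
import random
import zlib
from collections import defaultdict


def hamming(a, b):
    """K-Modes dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def make_minhash(num_hashes, seed=0):
    """Return a function mapping a categorical item to a MinHash signature."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]

    def signature(item):
        # Treat the item as the set of (attribute index, value) tokens and
        # keep, for each salted hash function, the minimum hash value.
        return tuple(
            min(zlib.crc32(f"{salt}|{i}|{v}".encode()) for i, v in enumerate(item))
            for salt in salts
        )

    return signature


def build_index(modes, signature, bands, rows):
    """Hash every cluster mode into one bucket per band."""
    index = defaultdict(set)
    for cid, mode in enumerate(modes):
        sig = signature(mode)
        for b in range(bands):
            index[(b, sig[b * rows:(b + 1) * rows])].add(cid)
    return index


def assign(item, modes, index, signature, bands, rows):
    """Assign item to the nearest mode among its LSH candidates."""
    sig = signature(item)
    candidates = set()
    for b in range(bands):
        candidates |= index.get((b, sig[b * rows:(b + 1) * rows]), set())
    if not candidates:
        # Assumption of this sketch: fall back to an exhaustive search.
        candidates = range(len(modes))
    # Ties on distance are broken by the smaller cluster id.
    return min(candidates, key=lambda c: (hamming(item, modes[c]), c))


# Toy usage with three hypothetical cluster modes over four attributes.
modes = [("a", "x", "p", "m"), ("b", "y", "q", "n"), ("c", "z", "r", "o")]
bands, rows = 4, 2
signature = make_minhash(bands * rows)
index = build_index(modes, signature, bands, rows)
cluster = assign(("b", "y", "q", "n"), modes, index, signature, bands, rows)
```

An item identical to a mode always shares every bucket with it, so it is assigned to that mode's cluster. For other items the shortlist is approximate: raising `bands` makes collisions with similar modes more likely, while raising `rows` prunes more candidates at the cost of possibly missing the true nearest mode.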


Bibliographic details

Authors: McConville, Ryan; Cao, Xin; Liu, Weiru; Miller, Paul
Year: 2016
Original format: PDF
Language: English

Similar literature

1. Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing [J]. Cao, Yiqun; Jiang, Tao; Girke, Thomas. Bioinformatics, 2010, no. 7.
2. Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing [J]. Thomas Girke. Bioinformatics, 2010, no. 7.
3. Toward more efficient locality-sensitive hashing via constructing novel hash function cluster [J]. Zhang Shi, Huang Jin, Xiao Ruliang. Concurrency and Computation: Practice and Experience, 2021, no. 20.
4. Accelerating large scale centroid-based clustering with locality sensitive hashing [C]. Ryan McConville, Xin Cao, Weiru Liu. IEEE International Conference on Data Engineering, 2016.
5. Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach [D]. Buaba, Ruben. 2012.
6. Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing [O]. Yiqun Cao, Tao Jiang, Thomas Girke.
7. Accelerating Large Scale Centroid-based Clustering with Locality Sensitive Hashing [O]. McConville Ryan, Cao Xin, Liu Weiru. 2016.
